[Optimize] Improve performance like/not like filter through pushdown function to storage engine #10355

compasses · 2022-06-23T04:11:29Z

Proposed changes

Issue Number: close #xxx

Problem Summary:

Describe the overview of changes.
To improve the performance of like/not like string matching, this PR pushes the function down to the storage engine. Tests show a roughly 2x-3x performance gain on the following query:

select sum(lo_quantity),sum(lo_extendedprice),lo_orderdate from lineorder where lo_orderpriority like '%MED%' group by lo_orderdate order by lo_orderdate;

vectorized: 
before: ~0.6s,  after: ~0.3s

not vectorized:
before: ~3s,  after: ~1.2s

data filtered in segment reader:
![image](https://user-images.githubusercontent.com/10161171/175207195-b01aa536-f346-42dd-99c5-db395cb78f73.png)
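For readers unfamiliar with the idea, here is a minimal sketch of what pushing a `like '%MED%'` filter down to the reader means. It is not the code in this PR: `Row`, `like_contains`, `scan_all`, and `scan_with_pushdown` are hypothetical names, the sample values are made up, and only the `%needle%` form of LIKE is modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row type: only two of the columns used by the query above.
struct Row {
    std::string lo_orderpriority;
    int lo_quantity;
};

// Matches LIKE '%needle%': true when needle occurs anywhere in value.
// Other LIKE features ('_' wildcards, escapes) are out of scope here.
inline bool like_contains(const std::string& value, const std::string& needle) {
    return value.find(needle) != std::string::npos;
}

// Without pushdown: the reader hands back every row and the caller filters later.
inline std::vector<Row> scan_all(const std::vector<Row>& segment) {
    return segment;
}

// With pushdown: the reader applies the filter itself, so rows that do not
// match are never copied to the caller.
inline std::vector<Row> scan_with_pushdown(const std::vector<Row>& segment,
                                           const std::string& needle) {
    std::vector<Row> out;
    for (const Row& row : segment) {
        if (like_contains(row.lo_orderpriority, needle)) {
            out.push_back(row);
        }
    }
    return out;
}

// Small self-check with made-up sample values.
inline void check_pushdown_sketch() {
    const std::vector<Row> segment = {{"3-MEDIUM", 10}, {"1-URGENT", 20}, {"3-MEDIUM", 30}};
    assert(scan_all(segment).size() == 3);
    assert(scan_with_pushdown(segment, "MED").size() == 2);
}
```

The gain reported above presumably comes from the second shape: fewer rows leave the segment reader, so less data is materialized for the upper layers.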

Checklist(Required)

Does it affect the original behavior: (No)
Has unit tests been added: (No Need)
Has document been added or modified: (No Need)
Does it need to update dependencies: (No)
Are there any changes that cannot be rolled back: (No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

be/src/vec/exec/volap_scan_node.cpp

compasses · 2022-06-25T09:54:41Z

@mrhhsg Hi, could you help check the change below, which came in with your recent commits? It prevents the function pushdown in this PR from working in vectorized mode.

BTW, pushing down like / not like gives about a 2x performance gain in the vectorized execution engine.

bool SegmentIterator::_can_evaluated_by_vectorized(ColumnPredicate* predicate) {
    auto cid = predicate->column_id();
    FieldType field_type = _schema.column(cid)->type();
    switch (predicate->type()) {
    case PredicateType::EQ:
    case PredicateType::NE:
    case PredicateType::LE:
    case PredicateType::LT:
    case PredicateType::GE:
    case PredicateType::GT: {
        if (field_type == OLAP_FIELD_TYPE_VARCHAR || field_type == OLAP_FIELD_TYPE_CHAR ||
            field_type == OLAP_FIELD_TYPE_STRING) {
            return config::enable_low_cardinality_optimize &&
                   _column_iterators[cid]->is_all_dict_encoding();
        } else if (field_type == OLAP_FIELD_TYPE_DECIMAL) {
            return false;
        }
        return true;
    }
    default:
        return false;
    }
}
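To make the concern concrete, here is a simplified, self-contained model of the switch above. The enums are stand-ins rather than the real Doris definitions, `PredicateType::LIKE` is a hypothetical enumerator, and `all_dict_encoded` stands in for `config::enable_low_cardinality_optimize && _column_iterators[cid]->is_all_dict_encoding()`.

```cpp
#include <cassert>

// Simplified stand-ins for the types used in the snippet above.
enum class PredicateType { EQ, NE, LE, LT, GE, GT, LIKE /* hypothetical */ };
enum class FieldType { INT, VARCHAR, DECIMAL };

// Same decision shape as _can_evaluated_by_vectorized, without any storage state.
inline bool can_evaluate_vectorized(PredicateType type, FieldType field,
                                    bool all_dict_encoded) {
    switch (type) {
    case PredicateType::EQ:
    case PredicateType::NE:
    case PredicateType::LE:
    case PredicateType::LT:
    case PredicateType::GE:
    case PredicateType::GT: {
        if (field == FieldType::VARCHAR) {
            return all_dict_encoded;
        } else if (field == FieldType::DECIMAL) {
            return false;
        }
        return true;
    }
    default:
        // Any predicate type not listed above, including the hypothetical
        // LIKE, is routed away from the vectorized pre-evaluation path.
        return false;
    }
}

// Small self-check of the two cases discussed in this thread.
inline void check_dispatch_sketch() {
    assert(can_evaluate_vectorized(PredicateType::EQ, FieldType::INT, false));
    assert(!can_evaluate_vectorized(PredicateType::LIKE, FieldType::VARCHAR, true));
}
```

Under this assumption a like predicate always takes the `default` branch, regardless of column type or encoding.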

mrhhsg · 2022-06-25T16:43:11Z

@compasses I am not sure; this logic should be the same as before.

compasses · 2022-06-28T02:20:07Z

OK. Previously the like predicate went to _short_cir_eval_predicate; now it goes to _pre_eval_block_predicate. I have made the like predicate support both paths.

@Gabriel39 Hi, could you help review this PR? I hope it can be merged ASAP, because I am concerned it will conflict with other PRs and I would need to keep merging master to fix the conflicts :).
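As a rough illustration of "support both paths", the sketch below gives one predicate two entry points. The method names `evaluate` and `evaluate_vec` come from the commit titles in this PR; the class name, the signatures, and the substring-only matching are assumptions made for this example and do not reproduce `LikeColumnPredicate`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical predicate: substring match standing in for LIKE '%needle%',
// optionally negated to model NOT LIKE.
class LikeContainsPredicate {
public:
    LikeContainsPredicate(std::string needle, bool negated)
            : needle_(std::move(needle)), negated_(negated) {}

    // Selection-vector style: sel[0..size) holds candidate row indices.
    // Matching indices are compacted to the front; the new count is returned.
    std::size_t evaluate(const std::vector<std::string>& column,
                         std::vector<std::uint16_t>& sel, std::size_t size) const {
        assert(size <= sel.size());
        std::size_t kept = 0;
        for (std::size_t i = 0; i < size; ++i) {
            const std::uint16_t row = sel[i];
            if (matches(column[row])) {
                sel[kept++] = row;
            }
        }
        return kept;
    }

    // Flag-vector style: writes one 0/1 flag per row of the column.
    void evaluate_vec(const std::vector<std::string>& column,
                      std::vector<std::uint8_t>& flags) const {
        flags.resize(column.size());
        for (std::size_t i = 0; i < column.size(); ++i) {
            flags[i] = matches(column[i]) ? 1 : 0;
        }
    }

private:
    bool matches(const std::string& value) const {
        const bool found = value.find(needle_) != std::string::npos;
        return found != negated_;
    }

    std::string needle_;
    bool negated_;
};
```

Both entry points share the same `matches` helper, so the two paths cannot disagree about which rows pass.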

Gabriel39

LGTM

github-actions · 2022-07-05T12:14:35Z

PR approved by anyone and no changes requested.

be/src/olap/like_column_predicate.h

be/src/exprs/function_filter.h

be/src/olap/like_column_predicate.cpp

be/src/common/config.h

yiguolei

LGTM

…function to storage engine (apache#10355)

* support like/not like conjuncts push down to storage engine
* vectorized engine support like/not like conjuncts push down to storage engine
* support both evaluate and evaluate_vec method in like predicate
* reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts
* change #ifndef to pragma once as per comments
* change enable_function_pushdown default to false

Co-authored-by: heguangnan <heguangnan@bytedance.com>
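The commit list mentions moving function conjuncts and gating the feature behind `enable_function_pushdown`. A minimal sketch of that idea, assuming a conjunct can be identified by a function-name tag: `Conjunct`, `split_pushable_conjuncts`, and the `"like"` / `"not like"` tags are hypothetical and are not the scan node code from this PR.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical conjunct: just a function-name tag for this sketch.
struct Conjunct {
    std::string fn_name;
};

inline bool is_like_conjunct(const Conjunct& c) {
    return c.fn_name == "like" || c.fn_name == "not like";
}

// Moves like/not like conjuncts out of `conjuncts` and returns them.
// When the flag is false nothing is moved, which matches a default-off config.
// Each conjunct ends up in exactly one of the two lists, never in both.
inline std::vector<Conjunct> split_pushable_conjuncts(std::vector<Conjunct>& conjuncts,
                                                      bool enable_function_pushdown) {
    std::vector<Conjunct> pushed;
    std::vector<Conjunct> remaining;
    for (Conjunct& c : conjuncts) {
        if (enable_function_pushdown && is_like_conjunct(c)) {
            pushed.push_back(std::move(c));
        } else {
            remaining.push_back(std::move(c));
        }
    }
    conjuncts = std::move(remaining);
    return pushed;
}

// Small self-check: two of four conjuncts are pushed when the flag is on.
inline void check_split_sketch() {
    std::vector<Conjunct> cs = {Conjunct{"eq"}, Conjunct{"like"},
                                Conjunct{"not like"}, Conjunct{"gt"}};
    const std::vector<Conjunct> pushed = split_pushable_conjuncts(cs, true);
    assert(pushed.size() == 2);
    assert(cs.size() == 2);
}
```

Building two fresh lists instead of erasing while iterating avoids evaluating a conjunct twice or dropping it, which is the kind of logic error the commit title refers to.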

function pushdown: #10355
NGram BloomFilter Index apply like pushdown: #11579
Enabled by default, make sure it stays active. If NGram BloomFilter Index is not used, this like pushdown can be replaced by #15917, which can push down all expressions including like.


compasses and others added 2 commits June 21, 2022 06:19

support like/not like conjuncts push down to storage engine

6ee39a8

vectorized engine support like/not like conjuncts push down to storage engine

e273f45

github-actions bot added the area/vectorization label Jun 23, 2022

Gabriel39 requested changes Jun 23, 2022

be/src/vec/exec/volap_scan_node.cpp (4 review comments, outdated)

compasses added 2 commits June 25, 2022 17:43

Merge branch 'master' into support-like-pushdown

958287a

fix conflict and changes as per comments

b3ee1d9

compasses changed the title ~~Improve performance like/not like filter through pushdown function to storage engine~~ [Optimize] Improve performance like/not like filter through pushdown function to storage engine Jun 25, 2022

support both evaluate and evaluate_vec method in like predicate

920819e

reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts

71abe4d

Gabriel39 previously approved these changes Jul 5, 2022

github-actions bot added the reviewed label Jul 5, 2022

yiguolei reviewed Jul 5, 2022

be/src/olap/like_column_predicate.h (outdated)

yiguolei reviewed Jul 5, 2022

be/src/exprs/function_filter.h (outdated)

change #ifndef to pragma once as per comments

f3a502b

compasses dismissed Gabriel39’s stale review via f3a502b July 7, 2022 07:27

compasses mentioned this pull request Jul 10, 2022

[Feature] Add NGRAM bloom filter index to speed up like queries. #10733

Closed

3 tasks

yiguolei reviewed Jul 13, 2022

be/src/olap/like_column_predicate.cpp

yiguolei reviewed Jul 13, 2022

be/src/common/config.h (outdated)

compasses added 3 commits July 16, 2022 14:32

change enable_function_pushdown default to false

d13858d

Merge branch 'master' into support-like-pushdown

75249e1

fix clang-format check issue

0df1540

yiguolei approved these changes Jul 19, 2022

yiguolei merged commit f6cb7a8 into apache:master Jul 19, 2022

xy720 mentioned this pull request Jul 19, 2022

[Compile] fix compile fail on clang #11019

Closed

mrhhsg mentioned this pull request Aug 9, 2022

[fix](like-predicate) Add missing functions in LikeColumnPredicate #11631

Merged

13 tasks

xinyiZzz mentioned this pull request Feb 26, 2023

[fix](scan) Default enable function(Like) pushdown #17154

Merged

5 tasks
